Generating Chinese Couplets using a Statistical MT Approach

نویسندگان

  • Long Jiang
  • Ming Zhou
چکیده

Part of the unique cultural heritage of China is the game of Chinese couplets (duìlián). One person challenges the other person with a sentence (first sentence). The other person then replies with a sentence (second sentence) equal in length and word segmentation, in a way that corresponding words in the two sentences match each other by obeying certain constraints on semantic, syntactic, and lexical relatedness. This task is viewed as a difficult problem in AI and has not been explored in the research community. In this paper, we regard this task as a kind of machine translation process. We present a phrase-based SMT approach to generate the second sentence. First, the system takes as input the first sentence, and generates as output an N-best list of proposed second sentences, using a phrase-based SMT decoder. Then, a set of filters is used to remove candidates violating linguistic constraints. Finally, a Ranking SVM is applied to rerank the candidates. A comprehensive evaluation, using both human judgments and BLEU scores, has been conducted, and the results demonstrate that this approach is very successful.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Generating Chinese Couplets and Quatrain Using a Statistical Approach

We propose a novel statistical approach to automatically generate Chinese couplets and Chinese poetry. For Chinese couplets, the system takes as input the first sentence and generates as output an N-best list of second sentences using a phrase-based SMT model. A comprehensive evaluation using both human judgments and BLEU scores has been conducted and the results demonstrate that this approach ...

متن کامل

Bayesian Semi-Supervised Chinese Word Segmentation for Statistical Machine Translation

Words in Chinese text are not naturally separated by delimiters, which poses a challenge to standard machine translation (MT) systems. In MT, the widely used approach is to apply a Chinese word segmenter trained from manually annotated data, using a fixed lexicon. Such word segmentation is not necessarily optimal for translation. We propose a Bayesian semi-supervised Chinese word segmentation m...

متن کامل

Bagging and Boosting statistical machine translation systems

a r t i c l e i n f o a b s t r a c t In this article we address the issue of generating diversified translation systems from a single Statistical Machine Translation (SMT) engine for system combination. Unlike traditional approaches, we do not resort to multiple structurally different SMT systems, but instead directly learn a strong SMT system from a single translation engine in a principled w...

متن کامل

Minimum Bayes-Risk Techniques in Automatic Speech Recognition and Statistical Machine Translation

Automatic Speech Recognition (ASR) and Machine Translation (MT) are fundamental language technologies that are emerging as core components of information processing systems. Each of these problems can be evaluated using a variety of metrics that measure different aspects of recognition or translation performance. In contrast, the training and decoding architectures of most of the current ASR an...

متن کامل

To Swap or Not to Swap? Exploiting Dependency Word Pairs for Reordering in Statistical Machine Translation

Reordering poses a major challenge in machine translation (MT) between two languages with significant differences in word order. In this paper, we present a novel reordering approach utilizing sparse features based on dependency word pairs. Each instance of these features captures whether two words, which are related by a dependency link in the source sentence dependency parse tree, follow the ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008